ABSTRACT

Existing studies on differential privacy mainly consider aggregation
on data sets where each entry corresponds to a particular participant
to be protected. In many situations, a user may pose a relational
algebra query on a sensitive database and desire differentially
private aggregation on the result of the query. However, no known
work is capable of releasing this kind of aggregation when the query
contains unrestricted join operations. This severely limits the
applications of existing differential privacy techniques, because many
data analysis tasks require unrestricted joins. One example is
subgraph counting on a graph. Existing methods for differentially
private subgraph counting address only edge differential privacy and
are limited to very simple subgraphs. Before this work, whether
any nontrivial graph statistic can be released with reasonable
accuracy under node differential privacy was still an open problem.

In this paper, we propose a novel differentially private mechanism
to release an approximation to a linear statistic of the result of
some positive relational algebra calculation over a sensitive
database. Unrestricted joins are supported in our mechanism. The
error bound of the approximate answer is roughly proportional to
the empirical sensitivity of the query, a new notion that measures
the maximum possible change to the query answer when a participant
withdraws its data from the sensitive database. For subgraph
counting, our mechanism provides the first solution that achieves node
differential privacy for any kind of subgraph.

1. INTRODUCTION 

An important task in data privacy research is to develop mechanisms
that publish useful results mined from sensitive databases without
disclosing individual privacy. Most existing techniques provide
rather limited privacy protection, since they usually address
specific attack models or rely on specific assumptions about the
prior knowledge the potential adversary may possess. In recent
years, the paradigm of differential privacy has received increasing
attention, because it provides robust and quantitative privacy
guarantees while making no assumptions about the prior knowledge
of the adversary. Data publishing algorithms that achieve differential
privacy should guarantee that their outputs are randomized such
that input databases differing in one participant are almost
indistinguishable to the adversary. Therefore, participating in a
database is unlikely to cause a privacy breach.

Existing studies on differential privacy are mainly based on a
simple data model, where the input database is a set of records and
each record corresponds to a participant. The output of a
differentially private data publishing algorithm should have almost
identical probability distributions for input data sets that differ in
exactly one record. Various kinds of queries that compute aggregations
on data sets have been considered, and much effort has been devoted to
linear aggregations, on which more complex queries can be built.

The success of most existing differentially private mechanisms
relies on the precondition that the maximum possible change to the
query answer resulting from the change of one participant is
small and bounded. This maximum possible change is called the
sensitivity of the query, and it determines the minimum magnitude
of noise that must be introduced into the answer. In practice,
however, many databases contain information about not only individual
participants, but also relationships between them. The change
of one participant may, in the worst case, have potentially unlimited
impact on the database and the query answer. Queries on such
databases are too complex to be tackled by existing techniques. In
this paper, we relax the precondition by allowing the potentially
unbounded impact that may be incurred by new participants joining
the database, and give an elegant solution.

1.1 Motivation 

Subgraph counting is an important problem in data mining and
social networks: it counts the number of occurrences of a given
query subgraph in an input graph. Despite the large body of work
on anonymization schemes for private graphs, little has been done
to provide quantitative guarantees of privacy and utility. In [12],
subgraph counting is studied under a much weaker version of
differential privacy. Their privacy guarantee protects only against a
specific class of adversaries. The error of the approximate answer
returned by their algorithm is large: the magnitude of noise grows
exponentially with the number of edges in the subgraph. In [10]
and [7], k-triangle and k-star counting are studied, with
better privacy and utility guarantees. In particular, they achieve
$\epsilon$-differential privacy for k-star counting, and
$(\epsilon, \delta)$-differential privacy, a weaker version of
differential privacy, for k-triangle counting. However, their work
cannot be extended to other kinds of subgraphs. It is also worth
mentioning the work in [5], which gives an algorithm for releasing an
approximation to the degree distribution of a graph and achieves
k-edge differential privacy.

A major problem of the above works is that they can only achieve
edge privacy, where each edge corresponds to a participant to be
protected. But for many real-world data sets, such as social networks,
each individual participant contributes to the graph a node rather than
just an edge. We desire privacy protection based on nodes rather
than edges. Unfortunately, it is difficult to achieve node differential
privacy while obtaining reasonable query accuracy, because the
maximum possible change to the query answer resulting from the
change of one node (together with all of its incident edges) is
comparable to the graph size. Prior to our work, whether any nontrivial
graph statistic can be released under node differential privacy with
reasonable accuracy was still an open problem [7]. It was widely
believed that algorithms achieving node differential privacy can only
return query answers that are too noisy for practical applications
[8, 5]. In this paper, we take on this seemingly impossible task
and give a general solution.

In reality, databases usually consist of a number of tables. A
participant may contribute tuples to several tables, and a tuple can be
contributed collectively by multiple participants. A user may want
to issue a SQL query to the database to obtain an output table, and then
request an approximate statistic of the output table. Subgraph
counting is, in fact, a special case of this general context, because
every subgraph count can be written as a SELECT query. It would be quite
useful if this kind of task could be solved under differential privacy.
There have been at least two attempts in the literature [9, 11], which
are based on bounding the global sensitivity of the query. However,
these works support only restricted kinds of join operations,
where one participant can affect only a constant number of tuples in
the output table. Even the simplest subgraph counting requires
unrestricted joins, where a participant can have unbounded impact
on the query answer. Existing methods are unable to
support this kind of join.
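
To make the claim about SELECT queries concrete, the following is a minimal sketch (not taken from the paper) that counts triangles with one SELECT and two self-joins, using Python's standard `sqlite3` module. The table name `edge`, the column names and the function `count_triangles` are illustrative assumptions. The example graph also shows why the join is unrestricted: a single hub node takes part in every triangle, so withdrawing it changes the count by an amount that grows with the graph.

```python
import sqlite3


def count_triangles(edges):
    """Count triangles of an undirected graph with one SELECT and two joins."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    # Store every undirected edge once, oriented so that src < dst.
    rows = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    conn.executemany("INSERT INTO edge VALUES (?, ?)", rows)
    # A triangle a < b < c is matched exactly once as (a,b), (b,c), (a,c).
    (count,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM edge AS e1
        JOIN edge AS e2 ON e2.src = e1.dst
        JOIN edge AS e3 ON e3.src = e1.src AND e3.dst = e2.dst
        """
    ).fetchone()
    conn.close()
    return count


# Hub node 0 is attached to every node of the path 1-2-3-4.
hub_graph = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]
print(count_triangles(hub_graph))                 # with the hub
print(count_triangles([(1, 2), (2, 3), (3, 4)]))  # hub withdrawn
```

With the hub present, every path edge closes one triangle; once the hub withdraws, no triangle remains.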

We focus on the case where the SQL query can be translated
into a series of positive relational algebra calculations. We aim at
releasing an approximation to a linear statistic of the output table
with reasonable accuracy under differential privacy. Our solution
covers subgraph counting. Both node and edge differential privacy
are achievable, depending on the choice of the user. Node differential
privacy is stronger than edge differential privacy, but the latter
allows better query accuracy. When nodes or edges of the graph
are associated with auxiliary information, our solution also allows
arbitrary kinds of constraints to be imposed on any edges or nodes of
the subgraph, which are not supported by prior works.

1.2 Contributions 

To develop differentially private mechanisms that can support
unrestricted joins, we face several difficulties. First, the problem
we study allows one participant to have complex impact on the
database. The data model assumed by existing differentially private
mechanisms is too simple to express the complex relations between
the database and the participants. Hence, a
new data model is needed to express how participants affect the
database content. Second, existing notions of sensitivity, including
global and local sensitivity, are no longer appropriate in our case,
because a new participant joining the database can, in the worst
case, have unlimited impact on the query answer, leading to
unbounded sensitivity. Thus, it is impossible for us to calibrate the
noise to such sensitivities. We need a new metric to measure the
least magnitude of noise that is necessary to answer a query. Third,
existing works for complex queries often compromise on the privacy
guarantee, the utility guarantee or the efficiency guarantee. Such
compromises can lead to severe problems in practical use, which
limits the applications of those techniques. It is a challenging task
to develop mechanisms that achieve all three guarantees.

Contributions of this paper are as follows: 



1) We propose a general model of sensitive databases, which
allows one participant to affect the database content in any possible
way. By formalizing the definition of neighborhood, the notion of
differential privacy on this data model is set up such that privacy
protection is based on individual participants.

2) We propose a new notion of sensitivity, called empirical
sensitivity, that measures the maximum possible change to the query
answer when a participant withdraws its data from the current
database content. Empirical sensitivity is always bounded, and is
often small. It gives a better measure of the least magnitude of
noise that is necessary to answer a query.

3) We develop a general but inefficient mechanism to answer any
monotonic query on a sensitive database. This mechanism guarantees
$\epsilon$-differential privacy, and the error bound is roughly
proportional to the global empirical sensitivity of the query.

4) We propose a specific model of sensitive databases based on the
K-relation or c-table. Every tuple in a K-relation is annotated with
a positive Boolean expression that specifies its condition of
presence. K-relations are closed under positive relational algebra
calculation. Hence they can be used to express the complex relations
between the participants and the table output by a SQL query.

5) We develop an efficient mechanism to answer any linear query
on a sensitive K-relation. This mechanism guarantees
$\epsilon$-differential privacy, and the error bound is roughly
proportional to the universal empirical sensitivity of the query. The
computation cost is polynomial in the size of the K-relation. Our
mechanism is the first solution to the problem of subgraph counting
for any subgraph. It can achieve either node differential privacy or
edge differential privacy, and the error bound is roughly proportional
to the local empirical sensitivity of the query.

6) We conduct extensive experiments to evaluate the proposed
mechanism. Experimental results validate the effectiveness and
efficiency of the new mechanism.

In Fig. 1 we present a brief comparison between our mechanism
and existing mechanisms.

2. PRELIMINARIES 
2.1 Privacy and Utility 

In this work, we use differential privacy [2], a state-of-the-art
paradigm for privacy-preserving data publishing. A randomized
algorithm is differentially private if it yields nearly identical
distributions over its outcomes when run on neighboring databases.

Definition 1 (Differential Privacy). A randomized
algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private
if for any pair of neighboring databases $D, D'$, and for any set of
possible outputs $S \subseteq \mathrm{Range}(\mathcal{A})$,

$$\Pr[\mathcal{A}(D) \in S] \le e^{\epsilon} \cdot \Pr[\mathcal{A}(D') \in S] + \delta \tag{1}$$

where the probability is taken over the randomness of $\mathcal{A}$.
When $\delta = 0$, the algorithm is $\epsilon$-differentially private.

All algorithms presented in this paper satisfy $\epsilon$-differential
privacy. The definition of neighboring depends on the context or
application. Usually, $D$ and $D'$ are said to be neighboring if they
differ only by one participant. In this case, a differentially private
algorithm can protect against disclosure of any participant. In the
literature, the database $D$ is often considered as a multiset of
records, where each record corresponds to a particular participant;
then $D$ and $D'$ are neighboring if $|D - D'| + |D' - D| = 1$.



Query: monotonic query on a sensitive database
  Our mechanism: $\tilde{O}(\widetilde{GS}_q/\epsilon)$ error, $\mathrm{Exp}(|P|)$ time
  Existing mechanisms: none

Query: linear statistic of the output of a SQL query
  Our mechanism: $\tilde{O}(\widetilde{US}_q/\epsilon)$ error, $\mathrm{Poly}(|P|, |R|)$ time
  Existing mechanisms: $O(US_q/\epsilon)$ error and $O(1)$ time if there
    are no unrestricted joins [9, 11]; not solvable if there are
    unrestricted joins, because $US_q \ge GS_q = +\infty$

Query: triangle counting
  Our mechanism: $\tilde{O}(\widetilde{LS}_q/\epsilon)$ error, $\mathrm{Poly}(k, |R|)$ time
  Existing mechanisms: $O(LS_q/\epsilon + [\text{illegible}])$ error;
    $O(|V| \cdot |E|)$ time; only achieve differential privacy based on
    edges [10]

Query: k-star counting (e.g., 3-star)
  Our mechanism: $\tilde{O}(\widetilde{LS}_q/\epsilon)$ error, $\mathrm{Poly}(k, |R|)$ time
    or $\mathrm{Poly}(|V|, |E|, k)$ time
  Existing mechanisms: $O(LS_q/\epsilon)$ error if
    $1/\epsilon = O(d_{\max}/k)$; $O(|V| \cdot |E|)$ time; only achieve
    differential privacy based on edges [7]

Query: k-triangle counting (e.g., 3-triangle)
  Our mechanism: $\tilde{O}(\widetilde{LS}_q/\epsilon)$ error, $\mathrm{Poly}(k, |R|)$ time
    or $\mathrm{Poly}(|V|, |E|, k)$ time
  Existing mechanisms: $O(LS_q/\epsilon)$ error if
    $\ln(1/\delta)/\epsilon = O(a_{\max})$; $O(|V| \cdot |E|)$ time; only
    achieve $(\epsilon, \delta)$-differential privacy based on edges [7]

Query: k-node l-edge connected subgraph counting
  Our mechanism: $\tilde{O}(\widetilde{LS}_q/\epsilon)$ error, $\mathrm{Poly}(k, l, |R|)$ time
  Existing mechanisms: $\Omega((k\,l \log|V|)^{[\text{illegible}]}/\epsilon)$
    error; $O(1)$ time; only achieve adversary privacy based on edges
    w.r.t. a specific class of adversaries [12]

Figure 1: Comparison between our mechanism and existing mechanisms.
$\tilde{O}$ means that logarithmic factors are omitted. For a
sensitive database, $|P|$ denotes the number of participants and
$|\mathrm{supp}(R)|$ denotes the number of tuples returned by the SQL
query. For subgraph counting, $|V|$ and $|E|$ denote the number of nodes
and edges in the graph, $|R| = |\mathrm{supp}(R)|$ denotes the true
query answer, $d_{\max}$ denotes the maximum degree of a node, and
$a_{\max}$ denotes the maximum number of common neighbors of a pair of
nodes. $GS$, $LS$, $US$, $\widetilde{GS}$, $\widetilde{LS}$ and
$\widetilde{US}$ are explained in Sec. 2 and Sec. 3. We have
$\widetilde{LS}_q \le LS_q$ and $\widetilde{US}_q \le US_q$. Note that
we do not take account of the time needed for generating the output
table or the list of matched subgraphs in the computation cost. For
subgraph counting, our solution can achieve differential privacy based
on either nodes or edges, depending on the choice of the user.



We are interested in queries that are real-valued functions of the
database (though other kinds of queries are also important). A
differentially private algorithm must introduce randomness into its
output, so the answer is never exact. The utility of the algorithm is
measured by how accurate its answer is.

Definition 2 ($(\epsilon, \delta)$-Accurate). For a database $D$, a
query $q$ and the true answer $q(D)$, we say that the answer returned
by an algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-accurate if

$$\Pr[|\mathcal{A}(D) - q(D)| \ge \epsilon] \le \delta \tag{2}$$



2.2 Global Sensitivity 

A well-known approach to achieving differential privacy is the Laplace
mechanism [2], which adds i.i.d. noise to the query answers.
The magnitude of the noise is calibrated to the sensitivity of
the query, a property of the query that measures the maximum
possible change to the true answer caused by a small change in the
database.

Definition 3 (Global Sensitivity). For a real-valued
function $q : \mathbb{D} \to \mathbb{R}^m$, the (global) sensitivity of
$q$ is

$$GS_q = \max_{D, D'} \|q(D) - q(D')\|_1 \tag{3}$$

where the maximum is taken over all pairs of neighboring
databases $D, D'$.

Given a database $D \in \mathbb{D}$, a query sequence
$q : \mathbb{D} \to \mathbb{R}^m$ and a parameter $\epsilon > 0$, the
Laplace mechanism $\mathcal{A}$ returns
$\mathcal{A}(D) = q(D) + (Y_1, \ldots, Y_m)$, where the $Y_i$ are i.i.d.
random variables that follow the Laplace distribution
$\mathrm{Lap}(GS_q/\epsilon)$, which has the following probability
density function:

$$\mathrm{Lap}(z \mid b) = \frac{1}{2b} \exp\left(-\frac{|z|}{b}\right) \tag{4}$$

The Laplace mechanism satisfies $\epsilon$-differential privacy. It is
easy to verify that the Laplace mechanism returns a
$(c\,GS_q/\epsilon, e^{-c})$-accurate answer to each query in the
sequence $q$, for any $c > 0$.

2.3 Local Sensitivity and Smooth Sensitivity

In the Laplace mechanism, the magnitude of noise depends on $GS_q$
and the parameter $\epsilon$, but not on the database $D$. Since the
global sensitivity $GS_q$ measures the impact of a participant on the
true answer in the worst case, this often introduces unnecessarily
large noise. In [10], a local measure of sensitivity was proposed.

Definition 4 (Local Sensitivity). For a real-valued
function $q : \mathbb{D} \to \mathbb{R}^m$ and a database
$D \in \mathbb{D}$, the local sensitivity of $q$ at $D$ is

$$LS_q(D) = \max_{D'} \|q(D) - q(D')\|_1 \tag{5}$$

where the maximum is taken over the neighborhood of $D$.

Observing that $GS_q = \max_D LS_q(D)$, we know that $LS_q(D)$
never exceeds $GS_q$. Ideally, we would like to release $q(D)$ with
noise magnitude proportional to $LS_q(D)$, but the noise magnitude
itself might leak information, in which case differential privacy is
not satisfied. [10] proposed that the noise magnitude should be
calibrated to a smooth upper bound $S$ on the local sensitivity,
namely, a function $S$ that is an upper bound on $LS_q$ at all points
and such that $\ln(S(\cdot))$ has low global sensitivity. [10] presents
algorithms to compute the optimal $S$, called the smooth sensitivity of
$q$, for a variety of queries.
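
The Laplace mechanism of Sec. 2.2 and its accuracy claim can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are assumptions, and the Laplace variable is sampled as the difference of two independent exponential variables, which is one standard way to draw from $\mathrm{Lap}(b)$. The last function estimates the tail probability $\Pr[|\mathcal{A}(D) - q(D)| \ge c\,GS_q/\epsilon]$ empirically so that it can be compared with $e^{-c}$.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two exponential variables."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(true_answers, global_sensitivity, epsilon, rng):
    """Add i.i.d. Lap(GS_q / epsilon) noise to every coordinate of q(D)."""
    scale = global_sensitivity / epsilon
    return [a + laplace_noise(scale, rng) for a in true_answers]


def tail_frequency(c, global_sensitivity, epsilon, trials, rng):
    """Empirical Pr[|A(D) - q(D)| >= c * GS_q / epsilon] for a single query."""
    threshold = c * global_sensitivity / epsilon
    hits = 0
    for _ in range(trials):
        (noisy,) = laplace_mechanism([0.0], global_sensitivity, epsilon, rng)
        hits += abs(noisy) >= threshold
    return hits / trials


rng = random.Random(7)
# The two printed numbers should be close: empirical tail vs. e^{-c} for c = 1.
print(tail_frequency(1.0, 1.0, 0.5, 20000, rng), math.exp(-1.0))
```

Note that the noise scale here depends only on $GS_q$ and $\epsilon$, never on the database, which is exactly the limitation that local and smooth sensitivity try to address.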



2.4 K-Relation and c-Table

Our work addresses aggregation on relations where each tuple
could be contributed by multiple participants, and each participant
could contribute multiple tuples. To track which participants
contribute a tuple and how they contribute, we use the K-relation [4]
or c-table [6], a model proposed in the field of uncertain databases,
where tuples are annotated (tagged) with their provenance information,
and positive relational algebra is generalized to such tagged-tuple
relations. Here we briefly review K-relations and c-tables.

Let $U$ be a finite set of attributes and $C$ a domain of values; then
each tuple is a function $t : U \to C$. The set of all such $U$-tuples
is denoted by $U$-Tup. Relations without annotations are just subsets
of $U$-Tup. Tuples in a $K$-relation are annotated with elements
from a semiring $(K, +, \cdot, 0, 1)$. A $K$-relation over $U$ is a
function $R : U\text{-Tup} \to K$ with a finite support
$\mathrm{supp}(R) = \{t \mid R(t) \ne 0\}$. The operations of positive
algebra on $K$-relations are defined as follows [4]:



empty relation: For any set of attributes $U$, there is
$\emptyset : U\text{-Tup} \to K$ such that $\emptyset(t) = 0$ for all $t$.

union: For $R_1, R_2 : U\text{-Tup} \to K$,
$R_1 \cup R_2 : U\text{-Tup} \to K$ is defined by

$$(R_1 \cup R_2)(t) = R_1(t) + R_2(t)$$

projection: For $R : U\text{-Tup} \to K$ and $V \subseteq U$,
$\pi_V R : V\text{-Tup} \to K$ is defined by

$$(\pi_V R)(t) = \sum_{t = t' \text{ on } V \text{ and } R(t') \ne 0} R(t')$$

selection: For $R : U\text{-Tup} \to K$ and a selection predicate
$P : U\text{-Tup} \to \{0, 1\}$, $\sigma_P R : U\text{-Tup} \to K$ is
defined by

$$(\sigma_P R)(t) = R(t) \cdot P(t)$$

natural join: For $R_i : U_i\text{-Tup} \to K$, $i = 1, 2$,
$R_1 \bowtie R_2 : (U_1 \cup U_2)\text{-Tup} \to K$ is defined by

$$(R_1 \bowtie R_2)(t) = R_1(t_1) \cdot R_2(t_2)$$

where $t_1 = t$ on $U_1$ and $t_2 = t$ on $U_2$.

renaming: For $R : U\text{-Tup} \to K$ and a bijection
$\beta : U \to U'$, $\rho_\beta R : U'\text{-Tup} \to K$ is defined by

$$(\rho_\beta R)(t) = R(t \circ \beta)$$

Intersection and cartesian product are just special cases of natural 
join. But difference is not supported in positive relational algebra. 
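
The operations above can be sketched in a few lines for the case where $K$ consists of positive Boolean expressions. This is an illustrative sketch under stated assumptions, not the paper's code: a K-relation is modeled as a Python dict from tuples to annotations, a tuple is a sorted sequence of (attribute, value) pairs, and an expression is a nested Python tuple such as `("and", x, y)`. All function names are hypothetical. Tuples absent from the dict carry the annotation 0 (False), and no algebraic simplification is applied to the expressions.

```python
def var(name):
    return ("var", name)


def row(**attrs):
    """A U-tuple, stored as attribute/value pairs sorted by attribute name."""
    return tuple(sorted(attrs.items()))


def union(r1, r2):
    out = dict(r1)
    for t, k in r2.items():
        out[t] = ("or", out[t], k) if t in out else k    # R1(t) + R2(t)
    return out


def project(r, attrs):
    out = {}
    for t, k in r.items():
        s = tuple((a, v) for a, v in t if a in attrs)
        out[s] = ("or", out[s], k) if s in out else k    # sum over t' = t on V
    return out


def select(r, predicate):
    return {t: k for t, k in r.items() if predicate(dict(t))}


def rename(r, mapping):
    return {tuple(sorted((mapping.get(a, a), v) for a, v in t)): k
            for t, k in r.items()}


def natural_join(r1, r2):
    out = {}
    for t1, k1 in r1.items():
        for t2, k2 in r2.items():
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out[row(**{**d1, **d2})] = ("and", k1, k2)  # R1(t1) * R2(t2)
    return out


def holds(expr, present):
    """Evaluate an annotation when exactly the participants in `present` stay."""
    if expr[0] == "var":
        return expr[1] in present
    left, right = holds(expr[1], present), holds(expr[2], present)
    return (left and right) if expr[0] == "and" else (left or right)


# Edge (u, v) is present only if both endpoint participants contribute.
edges = {
    row(a=1, b=2): ("and", var("p1"), var("p2")),
    row(a=2, b=3): ("and", var("p2"), var("p3")),
}
paths = natural_join(edges, rename(edges, {"a": "b", "b": "c"}))
ends = project(paths, {"a", "c"})
print(ends)
```

The single two-hop path from 1 to 3 ends up annotated with the conjunction of both edge annotations, so it disappears as soon as any of the three participants opts out.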

We study differentially private aggregation on a c-table, which is
a special case of a K-relation where $K$ consists of positive Boolean
expressions over some set $B$ of variables. The term positive means
that the expressions do not involve negation ($\neg$), but only
disjunction ($\vee$), conjunction ($\wedge$) and the constants True and
False. In our work, each variable in $B$ corresponds to a (potential)
participant being protected, and the Boolean expression annotating a
tuple $t$ gives the condition for $t$ to be present in the relation
when some participants may opt out.

In a c-table or K-relation, expressions that yield the same truth
value for all valuations of the variables in $B$ are considered
equivalent. But this is not applicable to our work. An expression
$(b_1 \vee b_2) \wedge (b_1 \vee b_3)$ cannot simply be rewritten into
$b_1 \vee (b_2 \wedge b_3)$. Such rewriting could make our mechanism
fail to satisfy differential privacy. We will review this issue later.
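
The two expressions in this example can be compared mechanically. The sketch below, which uses a hypothetical nested-tuple encoding of positive Boolean expressions, confirms that they share a truth table while remaining different as written; it is only meant to illustrate the distinction the text draws between logical equivalence and syntactic form.

```python
from itertools import product


def holds(expr, assignment):
    """Evaluate a positive Boolean expression under a variable assignment."""
    if expr[0] == "var":
        return assignment[expr[1]]
    left, right = holds(expr[1], assignment), holds(expr[2], assignment)
    return (left and right) if expr[0] == "and" else (left or right)


def same_truth_table(e1, e2, names):
    return all(
        holds(e1, dict(zip(names, values))) == holds(e2, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )


b1, b2, b3 = ("var", "b1"), ("var", "b2"), ("var", "b3")
factored = ("and", ("or", b1, b2), ("or", b1, b3))
rewritten = ("or", b1, ("and", b2, b3))

print(same_truth_table(factored, rewritten, ["b1", "b2", "b3"]))  # logically equal
print(factored == rewritten)                                      # but not the same form
```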



3. PROBLEM FORMULATION 

3.1 Sensitive Databases and Monotonic 
Queries 

In the literature on differential privacy, a sensitive database is
typically considered as a multiset of records, and privacy is
defined by the indistinguishability between data sets that differ by
only one record. But this definition of privacy is no longer
appropriate in our case, where each participant could have a complex
effect on the database. To achieve differential privacy in our setting,
we need to know not only the content of the database, but also
how it changes if some participants withdraw their data. A
sensitive database being released should contain such self-descriptive
information. We propose a new, more general definition of a sensitive
database, as below.

Definition 5 (Sensitive Database). A sensitive database is an ordered
pair $(P, M)$, where $P$ is a finite set of participants contributing
the data, and $M$ is a function $M : \mathcal{P}(P) \to \mathbb{D}$
such that $M(P')$ is the content of the database if only the
participants in $P'$ contribute their data.

Once sensitive databases are formalized, we are ready to adapt
the notion of differential privacy to them by making clear which
sensitive databases are considered neighboring. We say
that two sensitive databases are neighboring if one database can be
obtained from the other by one participant withdrawing its data.

Definition 6 (Neighboring). Two sensitive databases $(P_1, M_1)$ and
$(P_2, M_2)$ are neighboring if $|P_1 - P_2| + |P_2 - P_1| = 1$ and
$M_1(P') = M_2(P')$ for all $P' \subseteq P_1 \cap P_2$.

Definition 7 (Ancestor). We say that $(P_1, M_1)$ is an ancestor of
$(P_2, M_2)$, denoted by $(P_1, M_1) \preceq (P_2, M_2)$, if
$P_1 \subseteq P_2$ and $M_1(P') = M_2(P')$ for all
$P' \subseteq P_1$.

We postulate a class $\Omega$ of sensitive databases, such that every
possible sensitive database being considered is an element of $\Omega$.
Moreover, if $(P, M) \in \Omega$, then all ancestors of $(P, M)$ are
also elements of $\Omega$. We make a further assumption that there is a
special element $D_0$ in $\mathbb{D}$ such that $M(\emptyset) = D_0$
for all $(P, M) \in \Omega$ (otherwise, $\Omega$ comprises disconnected
parts).

For a sensitive database $(P, M)$, a query $q$ takes as input $M(P)$,
the current content of the database, and outputs $q(M(P))$. In this
paper, we address queries that output a real number and are
monotonic.

Definition 8 (Monotonic Query). For a class $\Omega$ of sensitive
databases, a query $q : \mathbb{D} \to \mathbb{R}$ is monotonic if both
of the following hold:

• $q(D_0) = 0$

• $q(M_1(P_1)) \le q(M_2(P_2))$ for all $(P_1, M_1) \preceq (P_2, M_2)$

If the global sensitivity of a query is low, then the Laplace
mechanism can still be applied to obtain a differentially private
answer with reasonable accuracy. In many applications, however,
the change of a participant could, in the worst case, have an
excessive or even unlimited impact on the database content as well as
the query answer. No existing differentially private technique can
process queries with unbounded global or local sensitivity. Hence,
global and local sensitivity are no longer appropriate quantities for
measuring the amount of noise that must be introduced into the query
answer. We propose a new notion of sensitivity, empirical sensitivity,
which meets our need.



Definition 9 (Local Empirical Sensitivity). For
a real-valued function $q : \mathbb{D} \to \mathbb{R}^m$ and a
sensitive database $(P, M)$, the local empirical sensitivity of $q$ at
$(P, M)$ is

$$\widetilde{LS}_q(P, M) = \max_{p \in P} \|q(M(P)) - q(M(P - \{p\}))\|_1 \tag{6}$$

If $P = \emptyset$, then $\widetilde{LS}_q(P, M) = 0$.

Definition 10 (Global Empirical Sensitivity). For
a real-valued function $q : \mathbb{D} \to \mathbb{R}^m$ and a
sensitive database $(P, M)$, the global empirical sensitivity of $q$ at
$(P, M)$ is

$$\widetilde{GS}_q(P, M) = \max_{(P', M') \preceq (P, M)} \widetilde{LS}_q(P', M') \tag{7}$$

Empirical sensitivity measures the maximum possible change
to the query answer when a participant opts out. It is obvious
that $\widetilde{LS}_q(P, M) \le LS_q(M(P)) \le GS_q$ and
$\widetilde{LS}_q(P, M) \le \widetilde{GS}_q(P, M) \le GS_q$.
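
Definitions 9 and 10 can be evaluated by brute force on a small example. The sketch below is illustrative only: it assumes node privacy on a fixed graph, so that participants are nodes and $M(P')$ is the subgraph induced by $P'$, and it takes triangle counting as the monotonic query. The graph, the names `content` and `triangle_count`, and the exhaustive enumeration of ancestors are all assumptions made for the example; the enumeration is exponential in $|P|$.

```python
from itertools import combinations

# A 4-clique on {1, 2, 3, 4} plus a pendant node 5 attached to node 1.
EDGES = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (1, 5)}


def content(nodes):
    """M(P'): the subgraph induced by the participating nodes P'."""
    return {(u, v) for (u, v) in EDGES if u in nodes and v in nodes}


def triangle_count(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def local_empirical_sensitivity(participants, query):
    """Eq. (6): largest change when one participant withdraws."""
    full = query(content(participants))
    return max(
        (abs(full - query(content(participants - {p}))) for p in participants),
        default=0,
    )


def global_empirical_sensitivity(participants, query):
    """Eq. (7): maximum of Eq. (6) over all ancestors (subsets of P)."""
    members = sorted(participants)
    return max(
        local_empirical_sensitivity(frozenset(subset), query)
        for size in range(len(members) + 1)
        for subset in combinations(members, size)
    )


P = frozenset({1, 2, 3, 4, 5})
print(triangle_count(content(P)))
print(local_empirical_sensitivity(P, triangle_count))
print(global_empirical_sensitivity(P, triangle_count))
```

Both empirical sensitivities stay bounded by what actually happens in this graph, whereas the global sensitivity of triangle counting under node privacy is not bounded by any property of the current data.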

3.2 Linear Queries on Sensitive Relations 

Although the model of sensitive databases and monotonic
queries is general, it may be too general to allow an efficient
mechanism for obtaining differentially private answers. We are in
particular interested in a special class of monotonic queries that
compute a linear aggregation on a relation, where the relation is
itself a function of the sensitive database.

Definition 11. A linear query $q$ on a sensitive database is a
function $q = q_+ \circ q_*$, where
$q_* : \mathbb{D} \to \mathcal{P}(U\text{-Tup})$ and
$q_+ : \mathcal{P}(U\text{-Tup}) \to \mathbb{R}$, such that $q_*$
transforms a database $D \in \mathbb{D}$ into a finite set of tuples
(e.g. by some relational algebra calculation), and $q_+$ is a linear
function: $q_+(T) = \sum_{t \in T} q_+(t)$.

Note that the output of $q_*$ must be finite, although the space
$U$-Tup can be infinite.

To ensure that a linear query $q$ is monotonic, we impose some
limitations on the functions $q_*$ and $q_+$. First, we require that
introducing a new participant into a sensitive database never results
in the removal of any tuple from the relation output by $q_*$. Second,
we assume that $q_+$ is nonnegative.

Definition 12. A linear query $q = q_+ \circ q_*$ on a sensitive
database is monotonic if the following hold:

• $q_*(M_1(P_1)) \subseteq q_*(M_2(P_2))$ for all $(P_1, M_1) \preceq (P_2, M_2)$

• $q_+(T) \ge 0$ for all finite $T \subseteq U\text{-Tup}$

If we want to answer a linear function $q_+$ that may yield negative
output, we can decompose it into two nonnegative components and
compute them individually:
$q_+(t) = \max(0, q_+(t)) - \max(0, -q_+(t))$.

Because we focus on a single query, where $q_*$ is fixed, we can
construct a class of virtual sensitive databases
$\Omega' = \{(P, M')\}$, such that each $(P, M)$ in $\Omega$ is mapped
into a virtual one $(P, M')$ where $M' = q_* \circ M$. Then $M'(P)$ is
a set of tuples and the query $q = q_+ \circ q_*$ is just a linear
function that computes $q_+(M'(P))$. The monotonicity of $q_*$ carries
over to the monotonicity of $M'$. We call such $(P, M')$ a sensitive
relation.

Definition 13. A sensitive relation $(P, M)$ is a sensitive
database with $M : \mathcal{P}(P) \to \mathcal{P}(U\text{-Tup})$, and
$M(P)$ must be finite. A class $\Omega$ of sensitive relations is
monotonic if $M_1(P_1) \subseteq M_2(P_2)$ for all
$(P_1, M_1) \preceq (P_2, M_2)$ in $\Omega$.



In this subsection and most parts of this paper, we study nonneg- 
ative linear queries for a monotonic class of sensitive relations. 

To obtain a differentially private answer to a query $q$ on a relation
$T = M(P)$, we must specify how the relation $T$ is affected by its
contributors $P$. In particular, we want to know for each tuple in $T$
the condition of its presence if some participants may opt out. The
definition of the function $M$ is too general to be handled efficiently
in practice. Therefore, we propose to represent $M$ as a c-table
or K-relation $R$, where each tuple $t$ is annotated with a positive
Boolean expression $R(t)$ that specifies its condition of presence.
Each variable $p$ in an expression indicates whether the participant
$p \in P$ contributes its data. A sensitive relation represented
as a K-relation is called a sensitive K-relation, denoted by $(P, R)$.

For a query $q$, an algorithm may first transform the original
sensitive database $(P, M)$ into a sensitive K-relation $(P, R)$ in a
flexible way. For the correctness of the differentially private
mechanism, however, the transformation should guarantee that for any
neighboring sensitive databases the corresponding sensitive
K-relations are also neighboring. The concept of neighboring for
sensitive K-relations is defined as follows.

Definition 14. Given an equivalence relation $\sim$ on $K$, two
sensitive K-relations $(P_1, R_1)$ and $(P_2, R_2)$, where
$P_2 = P_1 \cup \{p\}$, $p \notin P_1$, are neighboring if
$R_1(t) \sim R_2(t)|_{p \to \mathrm{False}}$ for all
$t \in U\text{-Tup}$, where $R_2(t)|_{p \to \mathrm{False}}$ denotes an
operation that replaces all occurrences of the variable $p$ in
$R_2(t)$ with the constant False.

An issue in the above definition is that it does not specify which
Boolean expressions in $K$ are equivalent. A necessary
condition for two expressions to be equivalent is that they
yield the same truth value for all valuations of the variables. The way
we write the expressions may or may not matter, depending on
the particular algorithm being used. For example, the inefficient
mechanism presented in Sec. 4.2 is independent of the form of the
expressions, so expressions that yield the same truth table are
equivalent. On the other hand, the efficient mechanism presented in
Sec. 5 relies on the way we write an expression. We will discuss this
later.

In Fig. 2 we present simple examples of K-relations that are
produced by different queries on a graph. Fig. 2(a) is a subgraph
counting query, while Fig. 2(b) is a more complicated query.

Finally, we introduce a variant of empirical sensitivity, which is 
relevant to the error bound of our mechanism. 

Definition 15 (Impact). For a sensitive K-relation
$(P, R)$ and a participant $p \in P$, the impact of $p$ at $R$ is

$$\mathrm{impact}(p, R) = \{t : R(t) \not\sim R(t)|_{p \to \mathrm{False}}\} \tag{8}$$

Definition 16 (Universal Empirical Sensitivity).
For a sensitive K-relation $(P, R)$, a participant $p \in P$ and a
nonnegative linear query $q$, the universal empirical sensitivity of
$q$ for the participant $p$ at $R$ is

$$\widetilde{US}_q(p, R) = \sum_{t \in \mathrm{impact}(p, R)} q(t) \tag{9}$$

For a sensitive K-relation $(P, R)$ and a nonnegative linear query
$q$, the universal empirical sensitivity of $q$ at $(P, R)$ is

$$\widetilde{US}_q(P, R) = \max_{p \in P} \widetilde{US}_q(p, R) \tag{10}$$



When $q(t) = 1$ for all $t$, $\widetilde{US}_q(p, R)$ measures how many
tuples in $R$ have $p$ appearing in their annotated expressions. The
error bound of our mechanism presented in Sec. 5 is roughly
proportional to the universal empirical sensitivity $\widetilde{US}_q$.
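
Definitions 15 and 16 can be sketched as follows. The sketch makes one simplifying assumption that should be read as such: an expression is treated as changed by $p \to \mathrm{False}$ exactly when $p$ occurs in it, which matches the counting interpretation given above for $q(t) = 1$. The nested-tuple encoding of expressions, the example relation of triangles and all function names are hypothetical.

```python
def variables(expr):
    """Participants mentioned in a positive Boolean expression."""
    if expr[0] == "var":
        return {expr[1]}
    return variables(expr[1]) | variables(expr[2])


def impact(p, relation):
    """Eq. (8), assuming R(t) changes exactly when p occurs in R(t)."""
    return {t for t, expr in relation.items() if p in variables(expr)}


def universal_empirical_sensitivity(participants, relation, weight=lambda t: 1):
    """Eqs. (9) and (10) for the nonnegative linear query given by `weight`."""
    return max(
        (sum(weight(t) for t in impact(p, relation)) for p in participants),
        default=0,
    )


def triangle(a, b, c):
    # Under node privacy a triangle is present iff all three nodes contribute.
    return ("and", ("and", ("var", a), ("var", b)), ("var", c))


# Triangles of a 4-clique on {1, 2, 3, 4}; node 5 takes part in no triangle.
R = {nodes: triangle(*nodes) for nodes in [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]}
P = {1, 2, 3, 4, 5}

print(sorted(impact(1, R)))
print(universal_empirical_sensitivity(P, R))
```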

4. THE RECURSIVE MECHANISM
FRAMEWORK

In this section, we first present the framework of a novel
differentially private mechanism, the recursive mechanism, which can
answer any monotonic query on any sensitive database. Then, we give a
general but inefficient implementation of the mechanism.

4.1 The Basic Framework

Our mechanism is based on two special sequences,
$H_0(P, M), \ldots, H_{|P|}(P, M)$ and
$G_0(P, M), \ldots, G_{|P|}(P, M)$,
as functions of the sensitive database $(P, M)$ in $\Omega$. We call
$H$ a recursive sequence, which should satisfy the conditions given by
the following definition.

Definition 17 (Recursive Sequence). A sequence
$H_0(P, M), \ldots, H_{|P|}(P, M)$, as a function on $\Omega$, is
called a recursive sequence if the following conditions hold:

• $H_0(P, M) = 0$ for all $(P, M) \in \Omega$

• (Recursive Monotonicity) $H_i(P_2, M_2) \le H_i(P_1, M_1) \le
H_{i+1}(P_2, M_2)$ for all neighboring
$(P_1, M_1) \preceq (P_2, M_2)$ in $\Omega$ and $0 \le i \le |P_1|$

We call $G$ a bounding sequence of $H$; it is also a recursive
sequence but satisfies an additional condition.

Definition 18 (Bounding Sequence). For a recursive
sequence $H$ and $g \ge 1$, a sequence
$G_0(P, M), \ldots, G_{|P|}(P, M)$, as a function on $\Omega$, is
called a $g$-bounding sequence of $H$ if the following conditions hold:

• $G$ is a recursive sequence

• $H_j(P, M) \le H_i(P, M) + (|P| - i)\,G_k(P, M)$ for all
$(P, M) \in \Omega$ and all $0 \le i \le j \le |P|$, where
$k = |P| - \lfloor (|P| - j)/g \rfloor$

If $g = 1$, we simply say $G$ is a bounding sequence of $H$.
The framework of our mechanism consists of three steps:

1. For a monotonic query q, we construct a recursive sequence H
and a g-bounding sequence G of H such that
$H_{|P|}(P, M) = q(M(P))$ for all $(P, M) \in \Omega$.

2. Based on G, find a quantity $\Delta$ such that $\Delta$ approximates
$G_{|P|}(P, M)$, or the empirical sensitivity of q, and $\ln \Delta$ has
low global sensitivity. Then we add multiplicative noise to $\Delta$,
obtaining $\hat{\Delta}$, which satisfies differential privacy.

3. Based on H, find a quantity X such that X approximates
the true answer $H_{|P|}(P, M)$ and X has global sensitivity
$\hat{\Delta}$. Then we add Laplace noise to X, obtaining $\hat{X}$,
which satisfies differential privacy.

The concrete constructions of H and G are omitted here. We
focus on Steps 2 and 3 in this subsection. In the remainder of this
paper, we will omit the argument (P, M) when the context is clear.

For a sensitive database (P, M) and parameters $\beta > 0$ and
$\theta > 0$, we compute $\Delta$ as follows:

$$\Delta = \min\{e^{i\beta}\theta : G_{|P|-i} \le e^{i\beta}\theta\} \qquad (11)$$



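We can observe several important properties of $\Delta$. In the sequel,
all proofs of lemmas and theorems are deferred to the appendix.

Lemma 1. $GS_{\ln \Delta} \le \beta$.

Lemma 2. $\Delta \le \max\{\theta, e^{\beta} G_{|P|}\}$.

Lemma 3. $G_{|P| - \ln(\Delta/\theta)/\beta} \le \Delta$.

Because $\ln \Delta$ has low global sensitivity, we can add Laplace
noise to $\ln \Delta$ to obtain a noisy version $\hat{\Delta}$ that
satisfies differential privacy. For parameters $\epsilon_1 > 0$ and
$\mu > 0$, we compute $\hat{\Delta} = e^{\mu + Y}\Delta$, where
$Y \sim Lap(\beta/\epsilon_1)$. This finishes Step 2, and $\hat{\Delta}$
has several properties.

Lemma 4. The release of $\hat{\Delta}$ satisfies $\epsilon_1$-differential privacy.

Lemma 5. $\Pr[\hat{\Delta} > e^{\mu + c}\Delta] \le \frac{1}{2} e^{-c\epsilon_1/\beta}$ for any $c > 0$.

Lemma 6. $\Pr[\hat{\Delta} < \Delta] \le \frac{1}{2} e^{-\mu\epsilon_1/\beta}$.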
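Step 2 can be illustrated with a short sketch. It assumes the bounding sequence is already available as a Python list `G` with `G[i]` standing for $G_i$ and `G[0] == 0`; the function names, the example sequence, and the parameter values are hypothetical and only serve to make the definition of $\Delta$ and its noisy version concrete.

```python
import math
import random


def compute_delta(G, beta, theta):
    """Return (j, Delta), where Delta = min{e^{i*beta}*theta : G[n-i] <= e^{i*beta}*theta}.

    Since e^{i*beta}*theta increases with i, the first feasible i gives the minimum.
    """
    n = len(G) - 1  # n plays the role of |P|
    for i in range(n + 1):
        candidate = math.exp(i * beta) * theta
        if G[n - i] <= candidate:
            return i, candidate
    raise ValueError("no feasible i; a recursive sequence has G[0] == 0")


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_delta(delta, beta, eps1, mu, rng):
    """Multiplicative noise: Delta_hat = e^{mu + Y} * Delta with Y ~ Lap(beta/eps1)."""
    y = laplace(beta / eps1, rng)
    return math.exp(mu + y) * delta


G = [0, 2, 3, 5, 8]  # hypothetical values of G_0..G_4
j, delta = compute_delta(G, beta=0.5, theta=1.0)
delta_hat = noisy_delta(delta, beta=0.5, eps1=0.25, mu=0.5, rng=random.Random(0))
print(j, delta, delta_hat)
```

In this example the scan stops at i = 3, so $\Delta = e^{1.5}\theta$, which respects both the bound of Lemma 2 and the inequality of Lemma 3.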

In Step 3, we first find a quantity X such that X approximates
$H_{|P|}(P, M)$ and $GS_X \le \hat{\Delta}$. We compute X as

$$X = \min\{H_i + (|P| - i)\hat{\Delta} : 0 \le i \le |P|\} \qquad (12)$$

We have several properties of X.

Lemma 7. For any fixed $\hat{\Delta} > 0$, $GS_X \le \hat{\Delta}$.

Lemma 8. If $\hat{\Delta} \ge \Delta$, then $H_{|P| - g\ln(\Delta/\theta)/\beta} \le X \le H_{|P|}$.

For parameter $\epsilon_2 > 0$, our mechanism releases $\hat{X} = X + Y$,
where $Y \sim Lap(\hat{\Delta}/\epsilon_2)$. We give the privacy and
utility guarantees in the following theorem.



Theorem 1. For parameters $\epsilon_1 > 0$, $\epsilon_2 > 0$, $\beta > 0$,
$\theta > 0$ and $\mu > 0$, the recursive mechanism, as described above,
satisfies $(\epsilon_1 + \epsilon_2)$-differential privacy, and is
$(e^{2\mu}\Delta^* c/\epsilon_2 + g\lceil \ln(\Delta^*/\theta)/\beta \rceil G_{|P|},\ e^{-\mu\epsilon_1/\beta} + e^{-c})$-accurate
for any $c > 0$, where $\Delta^* = \max\{\theta, e^{\beta} G_{|P|}\}$.
If $\epsilon_1 = \Theta(\epsilon)$, $\epsilon_2 = \Theta(\epsilon)$,
$\beta = \epsilon_1/k$, and $\theta$ and $\mu$ are constants, then the
mechanism is $(O(k \ln(G_{|P|})\, G_{|P|}/\epsilon),\ 2e^{-k\mu})$-accurate
as $\epsilon \to 0$, $k \to \infty$ and $G_{|P|} \to \infty$.

The error bound of the recursive mechanism is roughly proportional
to $G_{|P|}$. Hence, the most important task in a concrete implementation
of the recursive mechanism is to find sequences H and G with
$G_{|P|}$ as small as possible.
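Step 3 can be sketched in the same style. The code below assumes that the recursive sequence is given as a list `H` with `H[i]` standing for $H_i$ and that a noisy bound `delta_hat` has already been produced by Step 2; all names and numbers are illustrative assumptions, not part of the paper.

```python
import random


def compute_x(H, delta_hat):
    """X = min over 0 <= i <= |P| of H[i] + (|P| - i) * delta_hat, as in Eq. 12."""
    n = len(H) - 1  # n plays the role of |P|
    return min(H[i] + (n - i) * delta_hat for i in range(n + 1))


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release(H, delta_hat, eps2, rng):
    """Release X_hat = X + Y with Y ~ Lap(delta_hat / eps2)."""
    return compute_x(H, delta_hat) + laplace(delta_hat / eps2, rng)


H = [0, 1, 3, 6, 10]  # hypothetical values of H_0..H_4; H[4] is the true answer
x = compute_x(H, delta_hat=2.5)
x_hat = release(H, delta_hat=2.5, eps2=0.25, rng=random.Random(0))
print(x, x_hat)
```

With these numbers the minimum is attained at i = 2, so X = 3 + 2 * 2.5 = 8, which is below the true answer 10. A larger `delta_hat` moves X toward $H_{|P|}$ at the cost of more Laplace noise in the final release.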

4.2 A General but Inefficient Implementation 

Now we present a general but inefficient implementation of the
recursive mechanism, which can answer any monotonic query on
sensitive databases. For a monotonic query q, we construct H and
G as follows:

$$H_i(P, M) = \min_{(P', M') \preceq (P, M),\ |P'| = i} q(M'(P')) \qquad (13)$$

$$G_i(P, M) = \min_{(P', M') \preceq (P, M),\ |P'| = i} GS_q(P', M') \qquad (14)$$

Then we can show that the above H and G are what we want.

Theorem 2. The sequence H is a recursive sequence, and the
sequence G is a bounding sequence of H.

Because $G_{|P|}(P, M) = GS_q(P, M)$, the error bound of the recursive
mechanism using these H and G is roughly proportional to
the global empirical sensitivity of q. The main disadvantage of this
implementation is the expensive computation cost for H and G.
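To make the cost concrete, the construction of H can be sketched by brute force for one particular instantiation. The example assumes node privacy for triangle counting, with participants being nodes, M(P') being the subgraph induced by P', and q being the triangle count; this instantiation and all names are assumptions chosen for illustration. The enumeration visits every subset of participants, which is why the implementation is exponential.

```python
from itertools import combinations


def count_triangles(nodes, edges):
    """Count triangles in the subgraph induced by `nodes`."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    return sum(
        1
        for a, b, c in combinations(sorted(nodes), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def brute_force_H(nodes, edges):
    """H[i] = minimum triangle count over all induced subgraphs on i nodes."""
    nodes = sorted(nodes)
    return [
        min(count_triangles(subset, edges) for subset in combinations(nodes, i))
        for i in range(len(nodes) + 1)
    ]


# Hypothetical input: the complete graph on four nodes.
k4_nodes = [0, 1, 2, 3]
k4_edges = list(combinations(k4_nodes, 2))
H = brute_force_H(k4_nodes, k4_edges)
print(H)
```

For the complete graph on four nodes the sequence is [0, 0, 0, 1, 4]: the last entry is the true triangle count, and the first entry is 0 as Definition 17 requires.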

5. EFFICIENT RECURSIVE MECHANISM 

In this section, we present an efficient recursive mechanism,
which has polynomial computation cost and can answer linear
queries on sensitive K-relations.

5.1 Recursive Mechanism with Relaxation 

The central idea of the efficient recursive mechanism is relaxation,
which introduces a mapping $\phi : K \to [0,1]^{[0,1]^P}$ that
maps each Boolean expression k in K into a [0, 1]-valued expression
$\phi_k : [0,1]^P \to [0,1]$. The details of $\phi$ will be discussed in
the next subsection. Now, we first give some required properties of $\phi$.

To simplify notation, we let True = 1 and False = 0. For
$f : P \to [0,1]$, we define $|f| = \sum_p f(p)$. By $f \le g$ we mean
$f(p) \le g(p)$ for all p. The mapping has the following properties.

Correctness For any $k \in K$ and any Boolean assignment
$f : P \to \{0, 1\}$, $\phi_k(f)$ equals the truth value of k under f.

Naturalness For any $k \in K$, any real assignment $f : P \to [0,1]$
and any $p \in P$, if $f(p) = 0$, then $\phi_k(f) = \phi_{k|_{p \to False}}(f)$,
and if $f(p) = 1$, then $\phi_k(f) = \phi_{k|_{p \to True}}(f)$.

Monotonicity For any $k \in K$ and any real assignments
$f, g : P \to [0,1]$, if $f \le g$, then $\phi_k(f) \le \phi_k(g)$.

Convexity For any $k \in K$, $\phi_k$ is a convex function.

Truncated Linearity Define $\psi(x) = \min(1, x)$ and
$\bar{\phi}_k(f) = 1 - \phi_k(1 - \psi \circ f)$. For any $k \in K$,
$f : P \to [0,1]$ and $c \ge 1$,
$\bar{\phi}_k(cf) = \min(1, c\,\bar{\phi}_k(f))$.

Then, we introduce the notion of equivalence: two Boolean
expressions in K are equivalent if their relaxed functions under
$\phi$ are the same. This completes Definition 14 for neighboring
sensitive K-relations.

Definition 19 (Equivalence). For any $k_1, k_2 \in K$, $k_1$
and $k_2$ are equivalent, denoted by $k_1 \sim k_2$, if $\phi_{k_1} = \phi_{k_2}$.

Equivalence of two expressions implies that they yield the same
truth table. But expressions that yield the same truth table are not
necessarily equivalent. We will explain this in the next subsection.

Provided a nonnegative linear query $q : U\text{-Tup} \to \mathbb{R}$ and
mapping $\phi$, we construct the recursive sequence H as

$$H_i(P, R) = \min_{f \in [0,1]^P,\ |f| = i} \sum_t q(t)\,\phi_{R(t)}(f) \qquad (16)$$

Note that the sum is finite since R has finite support.

Theorem 3. The sequence H is a recursive sequence, and
$H_{|P|}(P, R) = q(supp(R))$.

To construct the bounding sequence of H, we also require that
an auxiliary quantity $S_{k,p}$ is provided for each $k \in K$ and $p \in P$,
which bounds the maximum change of $\phi_k(f)$ caused by a small
change to $f(p)$. Formally, for all $f, g \in [0,1]^P$, if $f \le g$ and
$f(p') = g(p')$ for all $p' \in P - \{p\}$, then

$$\phi_k(g) - \phi_k(f) \le (g(p) - f(p))\,S_{k,p} \qquad (17)$$

$S_{k,p}$ can be seen as an upper bound of the partial derivative of $\phi_k$
w.r.t. p. We call $S_{k,p}$ the $\phi$-sensitivity of the expression k for p.
We can observe the following fact.

Lemma 9. For any $f \le g$ in $[0,1]^P$ and any $k \in K$,

$$\phi_k(g) - \phi_k(f) \le \sum_p (g(p) - f(p))\,S_{k,p} \le |g - f| \max_p S_{k,p} \qquad (18)$$

Assuming that all $\phi$-sensitivities $S_{k,p}$ are known, we construct a
2-bounding sequence G of H as

$$G_i(P, R) = 2 \min_{f \in [0,1]^P,\ |f| = i}\ \max_{p \in P} \sum_t q(t)\,\phi_{R(t)}(f)\,S_{R(t),p} \qquad (19)$$

Theorem 4. The sequence G is a 2-bounding sequence of H.

5.2 The Mapping 

Here we discuss the mapping $\phi$, the issues about annotation
of Boolean expressions, and the utility guarantee of the recursive
mechanism. For an expression k, we define $\phi_k$ in a recursive way,
as follows:

• $\phi_{False}(f) = 0$ and $\phi_{True}(f) = 1$ for all f

• $\phi_p(f) = f(p)$ for all $p \in P$

• $\phi_{x \wedge y}(f) = \max\{0, \phi_x(f) + \phi_y(f) - 1\}$ and
$\phi_{x \vee y}(f) = \max\{\phi_x(f), \phi_y(f)\}$ for all expressions x and y

It can be shown that the above $\phi$ is just what we need.

Theorem 5. The mapping $\phi$, defined above, has the desired
properties of correctness, naturalness, monotonicity, convexity, and
truncated linearity.



The output of the mapping $\phi$ is invariant under certain kinds of
transformations of the input expressions.

Identity $\phi_{x \wedge True} = \phi_x$, $\phi_{x \vee False} = \phi_x$

Annihilator $\phi_{x \wedge False} = \phi_{False}$, $\phi_{x \vee True} = \phi_{True}$

Associativity $\phi_{x \wedge (y \wedge z)} = \phi_{(x \wedge y) \wedge z}$,
$\phi_{x \vee (y \vee z)} = \phi_{(x \vee y) \vee z}$

Distributivity of $\wedge$ over $\vee$
$\phi_{x \wedge (y \vee z)} = \phi_{(x \wedge y) \vee (x \wedge z)}$

Two expressions are equivalent if one can be obtained from the other
via a series of the above transformations. Because $\phi$ is defined
recursively, the above transformations can be applied at any place of an
expression k without changing $\phi_k$.

Before invoking our mechanism, one needs to first generate a
sensitive K-relation from the sensitive database and then issue a
monotonic query. To satisfy differential privacy, it is important
to ensure that for any neighboring sensitive databases the resulting
sensitive K-relations are still neighboring, according to
Definition 14. Hence, when we annotate tuples with expressions that
specify their conditions of presence, we should take care of the way
we write the expressions. Specifically, if a tuple t is annotated with
expression k, then we should ensure that when any participant p
opts out, the new expression k' annotated with t can be obtained
from $k|_{p \to False}$ via a series of invariant transformations. If this is
guaranteed, then we say the annotation is safe. Fortunately, safe
annotation is often easy to achieve. For positive relational algebra
queries, the annotation provided in Sec. 2.4 is always safe. Moreover,
two expressions in disjunctive normal form are equivalent if
and only if they produce the same truth table. Therefore, if we
always expand all expressions into disjunctive normal form, then
the annotation is always safe.

The $\phi$-sensitivities $S_{k,p}$, which bound the partial derivative of
$\phi_k$ w.r.t. p, are also computed in a recursive way:

• $S_{True,p} = S_{False,p} = 0$ and $S_{p,p} = 1$

• $S_{x \wedge y,p} = S_{x,p} + S_{y,p}$ and $S_{x \vee y,p} = \max\{S_{x,p}, S_{y,p}\}$

We can observe several properties of $\phi$-sensitivities: 1) $S_{k,p}$ is
not greater than the number of occurrences of p in expression k;
2) $S_{k,p}$ is at most one plus the number of occurrences of $\wedge$
in k; 3) if k is written in disjunctive normal form (e.g., the case
of subgraph counting), then $S_{k,p} \le 1$; 4) for a positive relational
algebra query, if each tuple in the input tables is associated with at
most one participant, and we use the approach described in Sec. 2.4
to annotate the tuples in the output table with expressions, then
$S_{k,p}$ is at most one plus the number of operations in the positive
relational algebra query. In Fig. 3 we present several examples of
$\phi$-sensitivity.
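The recursion for the $\phi$-sensitivities can be sketched in a few lines. The code assumes the same nested-tuple encoding of expressions used for illustration elsewhere (`("var", p)`, `("and", x, y)`, `("or", x, y)`), and it additionally assumes $S_{p',p} = 0$ for a variable $p' \ne p$, which the text does not state explicitly but which follows from $\phi_{p'}$ not depending on $f(p)$.

```python
def sensitivity(expr, p):
    """Compute the phi-sensitivity S_{expr,p} by structural recursion."""
    if expr is True or expr is False:
        return 0
    op = expr[0]
    if op == "var":
        # Assumption: a different variable contributes sensitivity 0.
        return 1 if expr[1] == p else 0
    if op == "and":
        return sensitivity(expr[1], p) + sensitivity(expr[2], p)
    if op == "or":
        return max(sensitivity(expr[1], p), sensitivity(expr[2], p))
    raise ValueError(f"unknown operator: {op!r}")


a, b, c = ("var", "a"), ("var", "b"), ("var", "c")

dnf = ("or", ("and", a, b), ("and", a, c))  # (a AND b) OR (a AND c)
nested = ("and", a, ("or", b, a))           # a AND (b OR a)

print(sensitivity(dnf, "a"), sensitivity(dnf, "b"))
print(sensitivity(nested, "a"))
```

The disjunctive normal form has sensitivity 1 for every participant, in line with property 3, while the nested expression has sensitivity 2 for a because a occurs on both sides of a conjunction.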

If we take the maximum S of $S_{k,p}$ over all $k \in \{R(t)\}$ and
$p \in P$, then we find that $G_{|P|}(P, R) \le 2S \cdot US_q(P, R)$.
Hence, we conclude that the error bound of our mechanism is
roughly proportional to S times the universal empirical sensitivity
of q. In general, S is linear in the length of the positive relational
algebra query. If all expressions are converted to disjunctive normal
form, then S is just the constant 1. In particular, for subgraph
counting we have $US_q = GS_q = LS_q$. Thus the error bound is
roughly proportional to the local empirical sensitivity of q.

5.3 Computation Cost 

Note that the computation for each $H_i$ and $G_i$ can be encoded
into a linear program with O(L) variables, where L denotes the
total length of all annotated expressions R(t) for $t \in supp(R)$.
Therefore, our mechanism can run in polynomial time.
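One possible encoding of $H_i$ is sketched below. It is an assumption about how the linear program could be written, not the authors' stated formulation: it introduces one variable $v_e$ for every subexpression e of the annotated expressions and replaces each max in the definition of $\phi$ by lower-bound constraints. Because $q(t) \ge 0$ and every constraint only bounds $v_e$ from below by a monotone function of its children, the optimum coincides with Eq. 16.

```latex
\begin{aligned}
\min_{f,\,v}\quad & \sum_{t \in supp(R)} q(t)\, v_{R(t)} \\
\text{s.t.}\quad & \sum_{p \in P} f(p) = i, \qquad 0 \le f(p) \le 1 \ \text{ for all } p \in P, \\
& v_{p} = f(p), \qquad v_{True} = 1, \qquad v_{False} = 0, \\
& v_{x \wedge y} \ge v_x + v_y - 1, \qquad v_{x \wedge y} \ge 0, \\
& v_{x \vee y} \ge v_x, \qquad v_{x \vee y} \ge v_y .
\end{aligned}
```

Under the same assumptions, $G_i$ can be obtained by minimizing an extra variable z subject to $z \ge 2\sum_t q(t)\,S_{R(t),p}\,v_{R(t)}$ for every $p \in P$, together with the constraints above. The number of variables is one per participant plus one per subexpression, which is O(L).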
Figure 3: Examples of $\phi$-sensitivities



A simple algorithm that computes all $H_i$ and $G_i$ will need to
solve O(|P|) linear programs. We can improve this by utilizing the
monotonicity of G and the convexity of H.

Lemma 10. (Convexity of H) $H_{i+1} - H_i \le H_{i+2} - H_{i+1}$ for
all $0 \le i \le |P| - 2$.

Let $j = \arg\min_i\{e^{i\beta}\theta : G_{|P|-i} \le e^{i\beta}\theta\}$;
then $\Delta = e^{j\beta}\theta$.
We can observe that $j = \ln(\Delta/\theta)/\beta \le 1 + \ln(G_{|P|}/\theta)/\beta$. Hence,
$\Delta$ can be computed with access to the last $O(\ln(G_{|P|})/\beta)$ entries
of G. Furthermore, because $G_{|P|-j} - e^{j\beta}\theta$ is monotonically
decreasing in j, we can use binary search to find j, with access to only
$O(\ln(\ln(G_{|P|})/\beta))$ entries of G.
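The binary search can be sketched as follows. The code assumes that G is nondecreasing in its index and that each entry is obtained through a callable `G_at`, standing in for one linear program per access; the names, the example sequence and the access log are illustrative assumptions.

```python
import math


def find_j(G_at, n, beta, theta):
    """Return the smallest j with G_at(n - j) <= e^{j*beta} * theta.

    Assumes G is nondecreasing, so the condition is monotone in j.
    The search range is capped by j <= 1 + ln(G_n / theta) / beta.
    """
    g_top = G_at(n)
    if g_top <= theta:
        return 0  # j = 0 is already feasible
    hi = min(n, math.ceil(1 + math.log(g_top / theta) / beta))
    lo = 1  # j = 0 was just shown to be infeasible
    while lo < hi:
        mid = (lo + hi) // 2
        if G_at(n - mid) <= math.exp(mid * beta) * theta:
            hi = mid
        else:
            lo = mid + 1
    return lo


G = [0, 2, 3, 5, 8]  # hypothetical values of G_0..G_4
accessed = []


def G_at(i):
    accessed.append(i)  # record which entries were actually needed
    return G[i]


j = find_j(G_at, n=len(G) - 1, beta=0.5, theta=1.0)
delta = math.exp(j * 0.5) * 1.0
print(j, delta, accessed)
```

The list `accessed` shows that only a few entries of G are evaluated, which matters when every entry requires solving a linear program.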

Given $\Delta$ and $\hat{\Delta}$, we then compute
$X = H_i + (|P| - i)\hat{\Delta}$, where
$i = \arg\min\{H_i + (|P| - i)\hat{\Delta} : 0 \le i \le |P|\}$. To do this, we
compute

$$i' = \arg\min_{i' \in [0, |P|]} H_{i'} + (|P| - i')\hat{\Delta} \qquad (20)$$

In the above formula, i' ranges over a real interval rather than over
the integers, and the definition of $H_{i'}$ is the same as Eq. 16. So i'
can be computed by solving a linear program. Due to the convexity
of H, we also know that $\lfloor i' \rfloor \le i \le \lceil i' \rceil$. Hence, i can then be
computed with access to only two entries of H.
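The final rounding step can be sketched as below. The real-valued minimizer `i_real` is assumed to come from a linear program solver that is not shown; here it is a hypothetical value, and `H_at` stands in for the evaluation of a single entry of H.

```python
import math


def integer_minimizer(H_at, n, delta_hat, i_real):
    """Pick the better of floor(i_real) and ceil(i_real) for H_i + (n - i) * delta_hat."""
    candidates = {max(0, math.floor(i_real)), min(n, math.ceil(i_real))}

    def objective(i):
        return H_at(i) + (n - i) * delta_hat

    best = min(candidates, key=objective)
    return best, objective(best)


H = [0, 1, 3, 6, 10]  # hypothetical convex sequence H_0..H_4
i_best, x = integer_minimizer(lambda i: H[i], n=4, delta_hat=2.5, i_real=1.6)
print(i_best, x)
```

Only the two entries H_1 and H_2 are evaluated in this example, and the better of the two gives X.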

Theorem 6. The efficient recursive mechanism can run in
$O(\ln(\ln(G_{|P|})/\beta)\,T(L))$ time, where T(L) denotes the time needed
to solve a linear program with O(L) variables, and L denotes the
total length of all annotated expressions R(t) for $t \in supp(R)$.

6. EXPERIMENTAL EVALUATION 

In this section, we empirically evaluate the performance of our
mechanism. We first compare our mechanism with existing mechanisms
for answering subgraph counting queries, then we use our
mechanism to process more general K-relations.

For each experiment, we generate several different graphs (or
K-relations) at random, and for each graph we run every mechanism
many times to obtain a series of answers. We measure the accuracy
of mechanisms by median relative error, that is, the median of
the ratios between the absolute errors and the true answers. This
measure of accuracy is consistent with the work [7].
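The accuracy measure can be written down in a few lines. This is a minimal sketch; the function name and the sample answers are illustrative and do not come from the experiments.

```python
import statistics


def median_relative_error(answers, true_answer):
    """Median of |answer - true_answer| / true_answer over repeated runs."""
    return statistics.median(abs(a - true_answer) / true_answer for a in answers)


# Hypothetical noisy answers from three runs of a mechanism, true answer 100.
err = median_relative_error([90, 105, 120], 100)
print(err)
```

The relative errors of the three runs are 0.1, 0.05 and 0.2, so the median relative error is 0.1.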

6.1 Subgraph Counting 

For subgraph counting, we compare the accuracy of our mechanism
with the following existing mechanisms:

Local sensitivity mechanisms include the triangle algorithm of
[10], the k-star algorithm and the k-triangle mechanism of [7]. All
algorithms are based on the local sensitivity of the query. The
k-triangle algorithm achieves only $(\epsilon, \delta)$-differential privacy, while
the others can achieve $\epsilon$-differential privacy.

RHMS mechanism of [12] can process subgraph counting for any
connected subgraphs. It achieves only $(\epsilon, \gamma)$-adversarial privacy for
a specific class of adversaries.
Figure 4: Comparing accuracy of different mechanisms in various settings.



We set $\epsilon = 0.5$ and $\delta = \gamma = 0.1$, which follows the parameter
setting of [7].^1 Our mechanism can achieve $\epsilon$-differential privacy,
which is much stronger than the corresponding $(\epsilon, \delta)$-differential
privacy and $(\epsilon, \gamma)$-adversarial privacy. We test two versions of our
mechanism: one provides node privacy, and the other provides edge
privacy. Because node privacy requires that the released answer
be insensitive to the change of one node and all of its incident
edges, it needs to introduce noise of much greater magnitude into
the answer. Note that all other mechanisms in the comparison can only
provide edge privacy. For our mechanism, we simply set $\theta = 1$,
$\beta = \epsilon/5$ and $\mu = 0.5$, and we set $\mu = 1$ for node differential
privacy.

We first perform experiments on synthetic graphs that are generated
at random. We generate graphs with various numbers of nodes
and average degree avgdeg. Each edge in the graph appears
independently with probability avgdeg/(|V| - 1). The experimental
results are presented in Fig. 4.

^1 It is widely believed by researchers that to provide a useful privacy
guarantee $\delta$ should be a negligible function of database size
(i.e., $\delta$ is asymptotically smaller than any inverse polynomial:
$\delta = 1/|P|^{\omega(1)}$). However, the k-triangle algorithm [7] yields too
noisy answers for such small $\delta$.

Figure 5: Running time of recursive mechanism, avgdeg = 10.
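The synthetic graph model described in this subsection, in which every edge appears independently with probability avgdeg/(|V| - 1), can be sketched as follows. The function name, the node labels 0..n-1, and the seed are assumptions made for illustration.

```python
import random
from itertools import combinations


def random_graph(num_nodes, avgdeg, rng):
    """Return the edge list of a random graph on nodes 0..num_nodes-1.

    Each of the possible edges is included independently with
    probability avgdeg / (num_nodes - 1).
    """
    p = avgdeg / (num_nodes - 1)
    return [
        (u, v)
        for u, v in combinations(range(num_nodes), 2)
        if rng.random() < p
    ]


edges = random_graph(num_nodes=200, avgdeg=10, rng=random.Random(0))
print(len(edges))
```

With this edge probability the expected degree of every node equals avgdeg, so the expected number of edges is num_nodes * avgdeg / 2.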

It can be observed that the RHMS mechanism does not yield meaningful
answers for triangle counting and 2-triangle counting. This
is because its error bound grows exponentially with the number
of edges in the subgraph. In some experiments, the relative errors of
the RHMS mechanism are extremely high and the curves do not show
in the figures. Moreover, the errors of local sensitivity mechanisms
are also too high to be useful for triangle counting and 2-triangle
counting when the graph is very sparse, because the smooth upper
bound of local sensitivity is often high (relative to the true answer)
for triangle counting on sparse graphs.

Figure 6: Sizes of real graphs and running time of our mechanism
for triangle counting.

Our mechanism, when providing edge privacy (the same as the other
compared mechanisms), always yields the most accurate answers.
When providing node privacy, our mechanism has high relative error
for 2-star counting and 2-triangle counting. This is because the
change of one node can affect a large number of 2-stars and
2-triangles in the graph. Nonetheless, the relative error of our
mechanism decreases as the size of the graph grows.

In Fig. 5 we present the running time of our mechanism. Because
each matched subgraph found in the whole graph contributes
a tuple to the K-relation, the computation cost of our mechanism
grows polynomially with the true answer. Since the average degree
is fixed, the number of triangles and 2-triangles often decreases
when the graph enlarges, hence our mechanism runs faster for large
sparse random graphs. On the other hand, the number of 2-stars is
roughly proportional to the number of nodes, so the running time
of our mechanism grows with the graph size.^2

We also evaluate the mechanisms on several real datasets.^3
Experimental results are shown in Fig. 6 and 7. We can see that our
mechanism is often superior to the other mechanisms. This validates
the practical usage of our mechanism.

^2 When the degrees of nodes are large, the number of k-stars and
k-triangles can grow exponentially with k. One may think that
our mechanism has exponential computation cost in this situation.
Actually, the algorithm can be improved by a clever construction
of the K-relation, such that the size of the K-relation is asymptotically
independent of k. Due to limitation of space, we cannot present the
details in this paper.
^3 Available at http://www.cise.ufl.edu/research/sparse/matrices/

6.2 Processing K-Relations

Figure 8: Evaluating recursive mechanism on K-relations with
Figure 9: Evaluating recursive mechanism on K-relations of
various sizes, each expression has 3 clauses.



Finally, we evaluate the performance of our mechanism for
processing more general queries. Because there are many different
kinds of positive relational algebra queries, we directly generate
K-relations that could be produced by some relational queries. In
particular, we consider two kinds of K-relations: K-relations in which
every tuple is annotated with a 3-DNF Boolean expression, and
K-relations in which every tuple is annotated with a 3-CNF Boolean
expression. A 3-DNF K-relation can be produced by a union of
many join results, and a 3-CNF K-relation can be produced by a
join of many unions of tables. We simply generate all expressions
at random, but ensure that all annotated expressions have the same
length. We also make |P|, the total number of variables, equal to
|supp(R)|, the size of the K-relation. We let $q(t) = 1$, that is, the
true answer is just |supp(R)|. The performance of our mechanism
is shown in Fig. 8 and 9. We do not present experimental results
for different kinds of $q(t)$ because the curves are almost the same.

The dotted curves in the figures denote the relative error if the
absolute error exactly matches $US_q/\epsilon$, where $US_q$ is the maximum
number of tuples that have at least one common participant
appearing in their annotated expressions. The error of our mechanism
is nearly linear in $US_q/\epsilon$, as shown in the figures. The empirical
sensitivity $US_q$ is insensitive to the increase of the number
of participants and the number of tuples in R. Hence the relative
error of our mechanism can gradually decrease if more data are
available. In terms of computation cost, the running time of our
mechanism grows polynomially with |supp(R)| and the length of
expressions.



7. RELATED WORK 

Since differential privacy was introduced [2], it has gained
considerable attention, and many techniques were developed for
private data analysis. Dwork et al. [2] showed that differential privacy
can be achieved if we calibrate the noise to the global sensitivity
of the query, and proposed the Laplace mechanism. The noise
yielded by the Laplace mechanism is independent of the database
instance. Due to the simplicity and wide applicability of the Laplace
mechanism, many succeeding works for various query tasks were built
upon it. These include the relevant work [9][11]
that studied relational algebra queries under differential privacy.

The Laplace mechanism fails to provide useful answers for queries
that have large global sensitivity. Nissim et al. [10] introduced the
notion of local sensitivity, and proposed to calibrate the noise to a
smooth upper bound of the local sensitivity. This leads to the idea
of instance-dependent noise. They also gave algorithms for computing
the smooth sensitivity of triangle count as well as some other
statistics in a variety of domains. However, there is no general
way to compute the smooth sensitivity of a given query.

Inspired by the work 1 1 1, Karwa et al. [7] studied the problem of
k-star counting and k-triangle counting, with algorithms based on the
local sensitivity of the query. Rastogi et al. [12] addressed counting
of general subgraphs, which achieves utility better than global
sensitivity based mechanisms by relaxing the privacy guarantee. These
approaches only provide an edge privacy guarantee.

There were also extensive studies on privacy in graph data beyond
the scope of differential privacy, but most do not provide
qualitative privacy and utility guarantees. Readers can refer to the
survey [14] for techniques that are based on k-anonymity.

8. CONCLUSION 

In this paper, we have presented a novel differentially private
mechanism for releasing an approximation to a linear statistic of a
table output by some positive relational algebra query on a database.
It treats subgraph counting as a special case, and can provide a
guarantee of either node differential privacy or edge differential privacy.
Empirical evaluation shows that our mechanism can return more
accurate answers than existing algorithms for subgraph counting,
while achieving the same or even stronger privacy guarantee.
APPENDIX
A. PROOFS

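Lemma 1. $GS_{\ln \Delta} \le \beta$.

Proof Sketch. For all neighboring $(P_1, M_1) \preceq (P_2, M_2)$,
let $i = \arg\min_i\{e^{i\beta}\theta : G_{|P_1|-i}(P_1, M_1) \le e^{i\beta}\theta\}$ and
$j = \arg\min_j\{e^{j\beta}\theta : G_{|P_2|-j}(P_2, M_2) \le e^{j\beta}\theta\}$. We show that
$i \le j \le i + 1$. Note that G is a recursive sequence. Because
$e^{(i-1)\beta}\theta < G_{|P_1|-(i-1)}(P_1, M_1) \le G_{|P_2|-(i-1)}(P_2, M_2)$, we
have $j \ge i$. Similarly, because $e^{(i+1)\beta}\theta \ge G_{|P_1|-i}(P_1, M_1) \ge
G_{|P_2|-(i+1)}(P_2, M_2)$, we have $j \le i + 1$. □

Lemma 2. $\Delta \le \max\{\theta, e^{\beta} G_{|P|}\}$.

Proof Sketch. Suppose $\Delta = e^{i\beta}\theta$. If $i = 0$, then $\Delta = \theta$;
otherwise, $\Delta = e^{\beta} e^{(i-1)\beta}\theta < e^{\beta} G_{|P|-(i-1)} \le e^{\beta} G_{|P|}$. □

Lemma 3. $G_{|P| - \ln(\Delta/\theta)/\beta} \le \Delta$.

Proof Sketch. Suppose $\Delta = e^{j\beta}\theta$. Then the lemma is true
because $\ln(\Delta/\theta)/\beta = j$. □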

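Lemma 5. $\Pr[\hat{\Delta} > e^{\mu + c}\Delta] \le \frac{1}{2} e^{-c\epsilon_1/\beta}$ for any $c > 0$.

Proof Sketch.

$$\Pr[\hat{\Delta} > e^{\mu + c}\Delta] = \Pr_{Y \sim Lap(\beta/\epsilon_1)}[Y > c] \qquad (21)$$

$$= \Pr_{Y \sim Lap(1)}[Y > c\epsilon_1/\beta] \qquad (22)$$

$$= \frac{1}{2} e^{-c\epsilon_1/\beta} \qquad (23)$$

□

Lemma 6. $\Pr[\hat{\Delta} < \Delta] \le \frac{1}{2} e^{-\mu\epsilon_1/\beta}$.

Proof Sketch. The same as the previous lemma. □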

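Lemma 7. For any fixed $\hat{\Delta} > 0$, $GS_X \le \hat{\Delta}$.

Proof Sketch. For all neighboring $(P_1, M_1) \preceq (P_2, M_2)$,
let $i = \arg\min_i H_i(P_1, M_1) + (|P_1| - i)\hat{\Delta}$ and
$j = \arg\min_j H_j(P_2, M_2) + (|P_2| - j)\hat{\Delta}$. Then, we have

$$X(P_1, M_1) = H_i(P_1, M_1) + (|P_1| - i)\hat{\Delta} \qquad (24)$$

$$\le H_{j-1}(P_1, M_1) + (|P_1| - (j - 1))\hat{\Delta} \qquad (25)$$

$$\le H_j(P_2, M_2) + (|P_2| - j)\hat{\Delta} \qquad (26)$$

$$= X(P_2, M_2) \qquad (27)$$

and

$$X(P_2, M_2) = H_j(P_2, M_2) + (|P_2| - j)\hat{\Delta} \qquad (28)$$

$$\le H_i(P_2, M_2) + (|P_2| - i)\hat{\Delta} \qquad (29)$$

$$\le H_i(P_1, M_1) + (|P_1| - i + 1)\hat{\Delta} \qquad (30)$$

$$= X(P_1, M_1) + \hat{\Delta} \qquad (31)$$

□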



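Lemma 8. If $\hat{\Delta} \ge \Delta$, then $H_{|P| - g\ln(\Delta/\theta)/\beta} \le X \le H_{|P|}$.

Proof Sketch. The second inequality is obvious, so we show
the first inequality. Let $i = \arg\min_i H_i + (|P| - i)\hat{\Delta}$ and suppose
$\Delta = e^{j\beta}\theta$, then

$$H_{|P| - g\ln(\Delta/\theta)/\beta} = H_{|P| - gj} \qquad (32)$$

(by the property of g-bounding sequence)

$$\le H_i + (|P| - i)\,G_{|P|-j} \qquad (33)$$

$$\le H_i + (|P| - i)\Delta \qquad (34)$$

$$\le H_i + (|P| - i)\hat{\Delta} \qquad (35)$$

$$= X \qquad (36)$$

□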



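Theorem 1. For parameters $\epsilon_1 > 0$, $\epsilon_2 > 0$, $\beta > 0$,
$\theta > 0$ and $\mu > 0$, the recursive mechanism, as described above,
satisfies $(\epsilon_1 + \epsilon_2)$-differential privacy, and is
$(e^{2\mu}\Delta^* c/\epsilon_2 + g\lceil \ln(\Delta^*/\theta)/\beta \rceil G_{|P|},\ e^{-\mu\epsilon_1/\beta} + e^{-c})$-accurate
for any $c > 0$, where $\Delta^* = \max\{\theta, e^{\beta} G_{|P|}\}$.
If $\epsilon_1 = \Theta(\epsilon)$, $\epsilon_2 = \Theta(\epsilon)$,
$\beta = \epsilon_1/k$, and $\theta$ and $\mu$ are constants, then the
mechanism is $(O(k \ln(G_{|P|})\, G_{|P|}/\epsilon),\ 2e^{-k\mu})$-accurate
as $\epsilon \to 0$, $k \to \infty$ and $G_{|P|} \to \infty$.

Proof Sketch. The privacy guarantee is obvious since both
the computation of $\hat{\Delta}$ and $\hat{X}$ satisfy differential privacy. The
utility guarantee is also true because
1) with probability at least $1 - e^{-\mu\epsilon_1/\beta}$, we have
$\Delta \le \hat{\Delta} \le e^{2\mu}\Delta$;
2) with probability at least $1 - e^{-c}$, we have $|\hat{X} - X| \le \hat{\Delta} c/\epsilon_2$;
3) if $\hat{\Delta} \ge \Delta$, we have $|X - H_{|P|}| \le (g\ln(\Delta/\theta)/\beta)\,G_{|P|}$. □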

Theorem 2. The sequence H is a recursive sequence, and the
sequence G is a bounding sequence of H.

Proof Sketch. For any neighboring $(P_1, M_1) \preceq (P_2, M_2)$,
$P_1 \cup \{p\} = P_2$, and for any $0 \le i \le |P_1|$, $y \in \{1, 2\}$, let
$P_y^i = \arg\min_{P' \subseteq P_y,\ |P'| = i} q(M_y(P'))$. Then, because $H_0 = 0$
and

$$H_i(P_2, M_2) = q(M_2(P_2^i)) \le q(M_2(P_1^i)) \qquad (37)$$

$$= q(M_1(P_1^i)) \qquad (38)$$

$$= H_i(P_1, M_1) \qquad (39)$$

$$= q(M_1(P_1^i)) \le q(M_1(P_2^{i+1} - \{p\})) \qquad (40)$$

$$\le q(M_2(P_2^{i+1})) \qquad (41)$$

$$= H_{i+1}(P_2, M_2) \qquad (42)$$

H is a recursive sequence. The same reasoning also applies to G
being a recursive sequence.

Now we show that G is a bounding sequence of H. For any
$0 \le i \le j \le |P|$, let $A = \arg\min_{P' \subseteq P,\ |P'| = i} q(M(P'))$, and let
$B = \arg\min_{P' \subseteq P,\ |P'| = j} GS_q(P', M)$. Then

$$H_j \le q(M(B)) \qquad (43)$$

$$\le q(M(A \cap B)) + |B - A|\,GS_q(B, M) \qquad (44)$$

$$\le q(M(A)) + (|P| - i)\,GS_q(B, M) \qquad (45)$$

$$= H_i + (|P| - i)\,G_j \qquad (46)$$

□

For the following proofs, we define $f \cup g, f \cap g : P \to [0,1]$ by
$(f \cup g)(p) = \max\{f(p), g(p)\}$ and $(f \cap g)(p) = \min\{f(p), g(p)\}$.
We also define $f_p : P \to [0,1]$ as an indicator function that has
$f_p(p) = 1$ and $f_p(p') = 0$ for all $p' \ne p$, and let $f_{P'} = \sum_{p \in P'} f_p$.

Theorem 3. The sequence H is a recursive sequence, and
$H_{|P|}(P, R) = q(supp(R))$.

Proof Sketch. $H_{|P|}(P, R) = q(supp(R))$ is obvious due to
correctness of $\phi$. We will show that H is a recursive sequence.

For any neighboring $(P_1, R_1) \preceq (P_2, R_2)$, $P_1 \cup \{p\} = P_2$,
and for any $0 \le i \le |P_1|$, $y \in \{1, 2\}$, let
$f_y^i = \arg\min_{f \in [0,1]^{P_y},\ |f| = i} \sum_t q(t)\,\phi_{R_y(t)}(f)$. Then, we have $H_0 = 0$
and

$$H_i(P_2, R_2) = \sum_t q(t)\,\phi_{R_2(t)}(f_2^i) \qquad (47)$$

$$\le \sum_t q(t)\,\phi_{R_2(t)}(f_1^i) \qquad (48)$$

(naturalness of $\phi$)

$$= \sum_t q(t)\,\phi_{R_2(t)|_{p \to False}}(f_1^i) \qquad (49)$$

($R_1(t)$ and $R_2(t)|_{p \to False}$ are equivalent)

$$= \sum_t q(t)\,\phi_{R_1(t)}(f_1^i) \qquad (50)$$

$$= H_i(P_1, R_1) \qquad (51)$$

Due to $|f_2^{i+1} \cap (1 - f_p)| \ge i$ and monotonicity of $\phi$, we have

$$H_i(P_1, R_1) = \sum_t q(t)\,\phi_{R_1(t)}(f_1^i) \qquad (52)$$

$$\le \sum_t q(t)\,\phi_{R_1(t)}(f_2^{i+1} \cap (1 - f_p)) \qquad (53)$$

$$= \sum_t q(t)\,\phi_{R_2(t)}(f_2^{i+1} \cap (1 - f_p)) \qquad (54)$$

$$\le \sum_t q(t)\,\phi_{R_2(t)}(f_2^{i+1}) \qquad (55)$$

$$= H_{i+1}(P_2, R_2) \qquad (56)$$

□

Theorem 4. The sequence G is a 2-bounding sequence of H.

Proof Sketch. The proof for G being a recursive sequence
is the same as the proof for H. Now we show that for any
$0 \le i \le j \le |P|$, we have $H_j \le H_i + (|P| - i)\,G_k$, where
$k = |P| - \lfloor (|P| - j)/2 \rfloor$.

Let $h = \arg\min_{h \in [0,1]^P,\ |h| = i} \sum_t q(t)\,\phi_{R(t)}(h)$,
$g = \arg\min_{g \in [0,1]^P,\ |g| = k} 2\max_p \sum_t q(t)\,\phi_{R(t)}(g)\,S_{R(t),p}$,
and $f(p) = \max(0, 1 - 2(1 - g(p)))$ for all $p \in P$. We first
observe that, due to truncated linearity of $\phi$, if $\phi_{R(t)}(f) > 0$, then
$\phi_{R(t)}(g) > 0.5$. Thus,

$$H_j \le \sum_t q(t)\,\phi_{R(t)}(f) \quad \text{(note that } |f| \ge j\text{)} \qquad (57)$$

$$\le \sum_t q(t)\,\phi_{R(t)}(h \cap f) + |f - h \cap f| \max_p \sum_{t : \phi_{R(t)}(f) > 0} q(t)\,S_{R(t),p} \qquad (58)$$

$$(\phi_{R(t)}(f) > 0 \Rightarrow \phi_{R(t)}(g) > 0.5) \qquad (59)$$

$$\le \sum_t q(t)\,\phi_{R(t)}(h \cap f) + |f - h \cap f| \max_p \sum_t 2q(t)\,\phi_{R(t)}(g)\,S_{R(t),p} \qquad (60)$$

$$\le \sum_t q(t)\,\phi_{R(t)}(h) + (|P| - i)\,G_k \qquad (61)$$

$$= H_i + (|P| - i)\,G_k \qquad (62)$$

□



Theorem 5. The mapping $\phi$, defined above, has the desired
properties of correctness, naturalness, monotonicity, convexity, and
truncated linearity.

Proof Sketch. These properties can be easily proved by
induction, so we omit the details. □

Lemma 10. (Convexity of H) $H_{i+1} - H_i \le H_{i+2} - H_{i+1}$ for
all $0 \le i \le |P| - 2$.

Proof Sketch. First note that the function
$h(f) = \sum_t q(t)\,\phi_{R(t)}(f)$ is convex, due to the convexity of $\phi$. Then, let
$f^i = \arg\min_{f \in [0,1]^P,\ |f| = i} h(f)$. We have

$$H_{i+1} = h(f^{i+1}) \le h((f^i + f^{i+2})/2) \qquad (63)$$

(convexity of h)

$$\le (h(f^i) + h(f^{i+2}))/2 \qquad (64)$$

$$= (H_i + H_{i+2})/2 \qquad (65)$$

□